Type Prediction in Noisy RDF Knowledge Bases Using Hierarchical Multilabel Classification with Graph and Latent Features

Authors

  • André Melo
  • Johanna Völker
  • Heiko Paulheim
Abstract

Semantic Web knowledge bases, in particular large cross-domain datasets, are often noisy, incorrect, and incomplete with respect to type information. As previous work shows, this incompleteness can be reduced with automatic type prediction methods. Most knowledge bases contain an ontology defining a type hierarchy, and, in general, entities are allowed to have multiple types (the classes assigned to an instance with the rdf:type relation). In this paper, we exploit these characteristics and formulate the type prediction problem as hierarchical multilabel classification, where the labels are types. We evaluate different sets of features, including entity embeddings, which can be extracted exclusively from the knowledge graph. We propose SLCN, a modification of the local classifier per node approach, which performs feature selection, instance sampling, and class balancing for each local classifier with the objective of improving scalability. Furthermore, we explore different variants of creating features for the classifier, including both graph and latent features. We compare the performance of our proposed method with the state-of-the-art type prediction approach and popular hierarchical multilabel classifiers, and report on experiments with large-scale cross-domain RDF datasets.
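To make the local classifier per node idea more concrete, the following minimal sketch shows how a balanced training set for one node of a type hierarchy might be assembled. It is not the authors' SLCN implementation: the toy hierarchy `PARENT`, the helper names `with_ancestors` and `local_training_set`, the choice of negatives (entities of the parent type that lack the node's type), and plain downsampling for class balancing are all assumptions made for illustration.

```python
import random

# Hypothetical toy type hierarchy (child -> parent); None marks the root.
PARENT = {
    "Thing": None,
    "Agent": "Thing",
    "Person": "Agent",
    "Organisation": "Agent",
    "Place": "Thing",
}


def with_ancestors(types, parent=PARENT):
    """Close a set of types under the hierarchy by adding all supertypes."""
    closed = set()
    for t in types:
        # Walk up until the root or an already-visited type is reached.
        while t is not None and t not in closed:
            closed.add(t)
            t = parent[t]
    return closed


def local_training_set(node, entity_types, parent=PARENT, seed=0):
    """Build a balanced training set for the binary classifier of `node`.

    Positives: entities typed with `node` (directly or via a subtype).
    Negatives: entities typed with the parent of `node` but not with `node`.
    The larger class is downsampled to the size of the smaller one.
    """
    closed = {e: with_ancestors(ts, parent) for e, ts in entity_types.items()}
    par = parent[node]
    pos = sorted(e for e, ts in closed.items() if node in ts)
    neg = sorted(
        e for e, ts in closed.items()
        if node not in ts and (par is None or par in ts)
    )
    rng = random.Random(seed)
    n = min(len(pos), len(neg))
    return rng.sample(pos, n), rng.sample(neg, n)


# Assumed example data: each entity maps to its asserted rdf:type values.
entity_types = {
    "e1": {"Person"},
    "e2": {"Person"},
    "e3": {"Person"},
    "e4": {"Organisation"},
    "e5": {"Place"},
}
positives, negatives = local_training_set("Person", entity_types)
```

In this sketch, restricting negatives to the parent's instances keeps each local problem small, which is one plausible way to read the scalability goal stated in the abstract. A real system would then train one binary classifier per node on these sets.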


Similar resources

A hierarchical multi-label classification ant colony algorithm for protein function prediction

This paper proposes a novel Ant Colony Optimisation algorithm (ACO) tailored for the hierarchical multilabel classification problem of protein function prediction. This problem is a very active research field, given the large increase in the number of uncharacterised proteins available for analysis and the importance of determining their functions in order to improve the current biological know...


Hierarchical Multilabel Protein Function Prediction Using Local Neural Networks

Protein function predictions are usually treated as classification problems where each function is regarded as a class label. However, different from conventional classification problems, they have some specificities that make the classification task more complex. First, the problem classes (protein functions) are usually hierarchically structured, with superclasses and subclasses. Second, prot...


Neuro-symbolic representation learning on biological knowledge graphs

Motivation Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In the past years, feature learning methods that are applicable to graph-structured data are becoming available, but have not yet widely been applied and evaluated on structured biological knowledge. Results: We deve...


Classification of encrypted traffic for applications based on statistical features

Traffic classification plays an important role in many aspects of network management such as identifying type of the transferred data, detection of malware applications, applying policies to restrict network accesses and so on. Basic methods in this field were using some obvious traffic features like port number and protocol type to classify the traffic type. However, recent changes in applicat...


Multilabel Classification Evaluation using Ontology Information

Multilabel classification using ontology information is an emerging research area that combines machine learning methods with knowledge models. The performance assessment of such classification systems poses new challenges. We propose an evaluation measure that considers the mapping of label sets to their groundtruth and allows for the incorporation of real world knowledge. A distance-based mea...



Journal:
  • International Journal on Artificial Intelligence Tools

Volume 26

Publication date: 2017